Universidad Autónoma de Madrid. Madrid (Spain)
July 26th, 2021
Summary:
Some optimization problems require the evaluation of an expensive objective function, whether the cost is monetary, computation time or some other resource. The analytical expression of the objective function may be unknown; without it, gradients are not accessible and hence cannot be used by optimization procedures. The evaluation of the objective can also be corrupted by noise, in which case evaluations at the same parameter values may return different results. Functions with these characteristics are known as black-boxes. An example of a black-box function is the estimate of the generalization error of a machine learning algorithm in terms of its hyper-parameters. Optimizing black-boxes is a task that has recently gained special importance in the machine learning community. Bayesian optimization (BO) is a family of methods that has been successfully applied to the optimization of black-boxes. BO methods are generally used when the budget of evaluations of the objective function is limited. They deliver good results because, at each step, they carefully choose the most promising point at which to evaluate the objective next. To do so, BO methods rely on a probabilistic model of the objective function. This model generates a predictive distribution of the objective at each input location that captures the uncertainty about its potential values. BO methods use this information to guide the optimization process. The key to success is that computing the predictive distribution is very cheap compared with evaluating the objective. The probabilistic model is often a Gaussian process (GP), since its predictive distribution can be computed in closed form. This thesis proposes several methods that extend the applicability of BO to a broader range of scenarios. One of these settings involves the simultaneous optimization of multiple objectives under several constraints. 
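The closed-form GP predictive distribution that makes this strategy cheap can be sketched in a few lines of NumPy (the toy observations, the squared-exponential kernel and its parameters are illustrative assumptions, not the thesis's actual models):

```python
import numpy as np

def sq_exp_kernel(A, B, lengthscale=1.0, amplitude=1.0):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return amplitude * np.exp(-0.5 * d2 / lengthscale ** 2)

def gp_predict(X_train, y_train, X_test, noise=1e-6):
    """Posterior predictive mean and variance of a zero-mean GP."""
    K = sq_exp_kernel(X_train, X_train) + noise * np.eye(len(X_train))
    K_s = sq_exp_kernel(X_train, X_test)
    K_ss = sq_exp_kernel(X_test, X_test)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))
    mean = K_s.T @ alpha                       # predictive mean
    v = np.linalg.solve(L, K_s)
    var = np.diag(K_ss) - np.sum(v ** 2, axis=0)  # predictive variance
    return mean, var

# A few expensive evaluations observed so far.
X = np.array([-2.0, 0.0, 1.5])
y = np.sin(X)

# Cheap predictions, with uncertainty, anywhere in the input space.
mean, var = gp_predict(X, y, np.linspace(-3.0, 3.0, 7))
```

The predictive variance shrinks near observed points and grows away from them, which is exactly the uncertainty signal that BO acquisition strategies exploit.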
For example, consider optimizing both the generalization error of a deep neural network and its prediction time, with respect to its hyper-parameters, under the constraint that the associated energy consumption on a mobile device stays below a specific value. The objectives are conflicting, since an accurate network will be large and will hence require long prediction times. The solution of a multi-objective problem is the set of points that exhibit the best trade-offs among all the objectives. This set is called the Pareto set. Of course, these points must be feasible, i.e., they must satisfy all the constraints. This thesis describes a BO method that uses information theory to solve constrained multi-objective problems. This method, called PESMOC, obtains state-of-the-art results in this task. In some practical situations we may be able to evaluate the objectives of the problem at several points simultaneously, for example when a cluster of computers is available. Most BO methods are, however, sequential. They may therefore fail to use all the available computational nodes during the optimization process, wasting resources. In this thesis we propose an extension of PESMOC that, at each iteration, suggests a batch of points to be evaluated in parallel. Our results show that this method, called PPESMOC, improves upon simple extensions of PESMOC to the parallel setting and upon other related methods. One issue with GPs is that they assume real-valued input variables. Real problems, however, also involve integer-valued and categorical variables. For example, the hyper-parameters of a deep neural network may include, besides the learning rate, the number of layers, which is integer-valued, and the activation function, which is categorical. This thesis proposes a transformation of the covariance function of a GP that deals simultaneously with integer, categorical and real variables. 
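The idea behind such a transformation can be sketched as follows: before computing the covariance, integer dimensions are rounded and one-hot-encoded categorical dimensions are snapped to their most active category, so any two inputs that decode to the same configuration receive maximal covariance (a minimal illustration; the variable layout and squared-exponential kernel are assumptions for the example, not the thesis's exact formulation):

```python
import numpy as np

def transform(x, integer_dims, categorical_groups):
    """Map a real-valued vector to the nearest valid mixed configuration."""
    z = np.array(x, dtype=float)
    for d in integer_dims:            # round integer-valued dimensions
        z[d] = np.round(z[d])
    for group in categorical_groups:  # one-hot groups: snap to the argmax
        onehot = np.zeros(len(group))
        onehot[np.argmax(z[group])] = 1.0
        z[group] = onehot
    return z

def kernel(x1, x2, integer_dims=(), categorical_groups=()):
    """Squared-exponential kernel evaluated on the transformed inputs."""
    z1 = transform(x1, integer_dims, categorical_groups)
    z2 = transform(x2, integer_dims, categorical_groups)
    return np.exp(-0.5 * np.sum((z1 - z2) ** 2))

# dim 0: learning rate (real); dim 1: number of layers (integer);
# dims 2-4: one-hot encoding of the activation function (categorical).
x_a = [0.01, 2.3, 0.9, 0.2, 0.1]
x_b = [0.01, 1.8, 0.7, 0.1, 0.3]
# Both decode to 2 layers and the first activation function.
k_ab = kernel(x_a, x_b, integer_dims=(1,), categorical_groups=([2, 3, 4],))
```

Because the GP only ever sees transformed inputs, its predictive distribution is constant over regions of the real-valued space that decode to the same integer or categorical configuration.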
Our experiments show that this transformation leads to better results in BO methods that rely on GPs as the probabilistic model. Finally, this thesis illustrates the use of BO methods to solve a real problem involving robust ocean wave feature prediction. With this goal, BO methods are used to tune the hyper-parameters of a hybrid Grouping Genetic Algorithm for attribute selection combined with an Extreme Learning Machine (GGA-ELM) for prediction. The proposed BO methodology has been tested on a real problem involving buoy data from the Western coast of the USA. The results show that BO methods outperform both a uniform search strategy and the hyper-parameter configuration specified by a human expert.
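In general terms, the BO loop underlying this kind of hyper-parameter tuning alternates between fitting the probabilistic model and optimizing an acquisition function over candidate configurations. A minimal sketch (the toy objective, candidate grid, lower-confidence-bound acquisition and GP settings are illustrative assumptions, not the actual GGA-ELM setup):

```python
import numpy as np

def gp_predict(X, y, Xs, ls=0.5, noise=1e-6):
    """Posterior mean/variance of a zero-mean GP (squared-exp kernel)."""
    k = lambda A, B: np.exp(-0.5 * (A[:, None] - B[None, :]) ** 2 / ls ** 2)
    K = k(X, X) + noise * np.eye(len(X))
    Ks = k(X, Xs)
    mean = Ks.T @ np.linalg.solve(K, y)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks), axis=0)
    return mean, var

def objective(x):
    """Stand-in for an expensive black-box (e.g. validation error)."""
    return (x - 0.3) ** 2 + 0.1 * np.sin(20 * x)

candidates = np.linspace(0.0, 1.0, 201)   # candidate hyper-parameter values
X = np.array([0.0, 1.0])                  # small initial design
y = objective(X)

for _ in range(10):                       # BO loop with a limited budget
    mean, var = gp_predict(X, y, candidates)
    # Lower confidence bound: trade off low predicted value vs uncertainty.
    lcb = mean - 2.0 * np.sqrt(np.maximum(var, 0.0))
    x_next = candidates[np.argmin(lcb)]   # most promising next evaluation
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

best = y.min()
```

A uniform search would spend the same budget on points chosen blindly; the acquisition function instead concentrates the expensive evaluations where the model predicts low values or remains uncertain.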
Citation:
E.C. Garrido-Merchán (2021), Advanced methods for Bayesian optimization in complex scenarios. Universidad Autónoma de Madrid. Madrid (Spain).